Duration modeling of Indian languages Hindi and Telugu
Abstract
This paper reports a preliminary attempt at data-driven modeling of segmental (phoneme) duration for two Indian languages, Hindi and Telugu. It presents duration models based on Classification and Regression Trees (CART) for predicting segmental duration. A number of features are proposed, and their usefulness and relative contribution to segmental duration prediction are assessed. The duration models are evaluated objectively using the root mean squared error (RMSE) of prediction and the correlation between actual and predicted durations. The models have been implemented in an Indian-language Text-to-Speech synthesis system [1] being developed within the Festival framework [2].
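The two objective measures named in the abstract can be illustrated with a minimal sketch. It assumes the actual and predicted segment durations are available as equal-length lists of numbers. The function names and the sample values are hypothetical and are not taken from the paper, and Pearson correlation is assumed because the abstract does not say which correlation measure is used.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error between actual and predicted durations."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def pearson_correlation(actual, predicted):
    """Pearson correlation coefficient between actual and predicted durations."""
    if len(actual) != len(predicted) or len(actual) < 2:
        raise ValueError("inputs must have equal length of at least 2")
    n = len(actual)
    mean_a = sum(actual) / n
    mean_p = sum(predicted) / n
    covariance = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    var_a = sum((a - mean_a) ** 2 for a in actual)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    if var_a == 0 or var_p == 0:
        raise ValueError("correlation is undefined for constant input")
    return covariance / math.sqrt(var_a * var_p)


if __name__ == "__main__":
    # Hypothetical phoneme durations in milliseconds (illustration only).
    actual_ms = [62.0, 85.0, 110.0, 47.0, 93.0]
    predicted_ms = [70.0, 80.0, 98.0, 55.0, 90.0]
    print("RMSE (ms):", rmse(actual_ms, predicted_ms))
    print("Correlation:", pearson_correlation(actual_ms, predicted_ms))
```

A lower RMSE indicates predictions closer to the observed durations, while a correlation near 1 indicates that the model follows the relative ordering and spread of durations even if individual errors remain.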
Related resources
Part-of-Speech Tagging and Chunking with Maximum Entropy Model
This paper describes our work on Part-of-Speech (POS) tagging and chunking for Indian languages, for the SPSAL shared task contest. We use a Maximum Entropy (ME) based statistical model. The tagger makes use of morphological and contextual information of words. Since only a small labeled training set is provided (approximately 21,000 words for all three languages), a ME based approach does not y...
The effects of native language on Indian English sounds and timing patterns
This study explored whether the sound structure of Indian English (IE) varies with the divergent native languages of its speakers or whether it is similar regardless of speakers' native languages. Native Hindi (Indo-Aryan) and Telugu (Dravidian) speakers produced comparable phrases in IE and in their native languages. Naïve and experienced IE listeners were then asked to judge whether different...
Statistical Machine Translation for Indian Languages: Mission Hindi 2
This paper presents the submission of the Centre for Development of Advanced Computing Mumbai (CDACM) to the NLP Tools Contest on Statistical Machine Translation in Indian Languages (ILSMT) 2015 (co-located with ICON 2015). The aim of the contest was to collectively explore the effectiveness of Statistical Machine Translation (SMT) while translating within Indian languages and between English and Indian lan...
Part of Speech Tagging and Shallow Parsing of Indian Languages
This paper describes and evaluates shallow parsing of several Indian languages utilizing Conditional Random Field models. We show how performance can be substantially improved by several feature enhancements and improved modeling techniques, including expanding the chunk tag inventory, and separating punctuation from linguistic phrases. We also report results from part of speech tagging of Hind...
Bidirectional Dependency Parser for Indian Languages
In this paper, we apply a bidirectional dependency parsing algorithm to parse Indian languages such as Hindi, Bangla and Telugu as part of the NLP Tools Contest, ICON 2010. The parser builds the dependency tree incrementally with two operations, namely proj and non-proj. The complete dependency tree given by the unlabeled parser is used by an SVM (Support Vector Machine) classifier for labeling. ...